101 research outputs found

    Replication for send-deterministic MPI HPC applications

    Replication has recently gained attention in the context of fault tolerance for large scale MPI HPC applications. Existing implementations try to cover all MPI codes and to be independent from the underlying library. In this paper, we evaluate the advantages of adopting a different approach. First, we try to take advantage of a communication property common to many MPI HPC applications, namely send-determinism. Second, we choose to implement replication inside the MPI library. The main advantage of our approach is simplicity. While being only a small patch to the Open MPI library, our solution, called SDR-MPI, supports most main features of the MPI standard, including all collectives and group operations. SDR-MPI additionally achieves good performance: experiments run with HPC benchmarks and applications show that its overhead remains below 5%.

    Towards Highly Available and Self-Healing Grid Services

    The volatility of nodes in large scale distributed systems endangers the availability of grid services and makes them difficult to use. In such a context, structured peer-to-peer overlays can be used to provide scalable and fault tolerant communication mechanisms. To ensure the availability of services, active replication can be used on top of the overlays. In this paper, we present Semias, a framework based on active replication on top of a structured overlay that provides high availability and self-healing for stateful grid services. The self-healing mechanisms of Semias ensure the high availability of the replicated services while minimizing the number of reconfigurations. We have used Semias to make the services of the Vigne grid middleware highly available. Experiments run on Grid'5000 and PlanetLab show the performance and self-healing properties of the framework.

    Semias: A Framework for Highly Available and Self-Healing Services in Large Scale Dynamic Distributed Systems

    Next generation HPC systems will be large scale distributed systems spread over wide area networks. Overlays are used in those systems to provide scalable and fault tolerant communication mechanisms. In such a context, providing highly available services to users is challenging. In this paper, we present Semias, a framework that provides stateful services with high availability and self-healing. Based on active replication on top of a structured overlay, Semias requires very few modifications to existing services. The self-healing mechanisms of Semias are designed to minimize the number of reconfigurations of replicated services while ensuring high availability. We have used Semias to make the services of the Vigne grid middleware highly available. Experiments run on the Grid'5000 testbed show the performance and self-healing properties of the framework.

    Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach

    Nowadays, NUMA architectures are common in compute-intensive systems. Achieving high performance for multi-threaded applications requires both a careful placement of threads on computing units and a careful allocation of data in memory. Finding such a placement is a hard problem, because performance depends on complex interactions across several layers of the memory hierarchy. In this paper, we propose a black-box approach to decide whether an application's execution time can be impacted by the placement of its threads and data and, in such a case, to choose the best placement strategy to adopt. We show that it is possible to reach near-optimal placement policy selection. Furthermore, the solutions work across several recent processor architectures, and decisions can be taken with a single run of low-overhead profiling.

    The Architecture of the XtreemOS Grid Checkpointing Service

    The EU-funded XtreemOS project implements a grid operating system (OS) that transparently exploits distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) to implement migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed heterogeneous environment. Such an environment may span millions of grid nodes, each using a system-specific checkpointer to save and restore application and kernel data structures. In this paper, we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to different checkpointing strategies that can be adapted to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows different checkpointers to be used in a uniform way. Furthermore, we discuss other grid-related checkpointing issues, including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to any other grid middleware or distributed OS.

    Beyond The Cloud, How Should Next Generation Utility Computing Infrastructures Be Designed?

    To accommodate the ever-increasing demand for Utility Computing (UC) resources while taking into account both energy and economic issues, the current trend consists in building larger and larger data centers in a few strategic locations. Although such an approach makes it possible to cope with the current demand while continuing to operate UC resources through a centralized software system, it is far from delivering sustainable and efficient UC infrastructures. We claim that a disruptive change in UC infrastructures is required: UC resources should be managed differently, considering locality as a primary concern. We propose to leverage any facilities available through the Internet in order to deliver widely distributed UC platforms that can better match the geographical dispersal of users as well as the unending demand. Critical to the emergence of such locality-based UC (LUC) platforms is the availability of appropriate operating mechanisms. In this paper, we advocate the implementation of a unified system driving the use of resources at an unprecedented scale by turning a complex and diverse infrastructure into a collection of abstracted computing facilities that is both easy to operate and reliable. By deploying and using such a LUC Operating System on backbones, our ultimate vision is to make it possible to host and operate a large part of the Internet on its own internal structure: a scalable and nearly infinite set of resources delivered by any computing facilities forming the Internet, from the larger hubs operated by ISPs, governments, and academic institutions to any idle resources that may be provided by end users. Unlike previous research on distributed operating systems, we propose to consider virtual machines (VMs) instead of processes as the basic element. System virtualization offers several capabilities that increase the flexibility of resource management, making it possible to investigate novel decentralized schemes.

    Muscle activation during gait in children with Duchenne muscular dystrophy

    The aim of this prospective study was to investigate changes in muscle activity during gait in children with Duchenne muscular dystrophy (DMD). Dynamic surface electromyography (EMG) recordings of 16 children with DMD and pathological gait were compared with those of 15 control children. The activity of the rectus femoris (RF), vastus lateralis (VL), medial hamstrings (HS), tibialis anterior (TA) and gastrocnemius soleus (GAS) muscles was recorded and analysed quantitatively and qualitatively. The overall muscle activity in the children with DMD was significantly different from that of the control group. Percentage activation amplitudes of RF, HS and TA were greater throughout the gait cycle in the children with DMD, and the timing of GAS activity differed from that of the control children. Significantly greater muscle coactivation was found in the children with DMD. There were no significant differences between sides. Since the motor command is normal in DMD, the hyperactivity and co-contractions likely compensate for gait instability and muscle weakness, but they may have negative consequences for the muscles and may increase the energy cost of gait. Simple rehabilitative strategies such as targeted physical therapies may improve stability and thus the pattern of muscle activity.

    On the Dark Side of Therapies with Immunoglobulin Concentrates: The Adverse Events

    Therapy with human immunoglobulin G (IgG) concentrates is a success story that has been ongoing for decades, with an ever-increasing demand for this plasma product. The clinical success of IgG concentrates is documented by the slowly increasing number of registered indications and the more rapid increase in off-label uses, a topic dealt with in another contribution to this special issue of Frontiers in Immunology. Part of the success is the adverse event (AE) profile of IgG concentrates, which is excellent even when therapy is needed lifelong. Transmission of pathogens could be entirely controlled in the last decade through the earlier introduction of a regulatory network by the authorities and the installation of quality standards by the plasma fractionation industry. The cornerstone of the regulatory network is current good manufacturing practice. Non-infectious AEs occur rarely and are mainly mild to moderate. However, in recent times, the increase in the frequency of hemolytic and thrombotic AEs has raised worrying questions about the possible background of these AEs. Below, we review elements of non-infectious AEs, with a particular focus on hemolysis and thrombosis. We discuss how the introduction of plasma fractionation by ion-exchange chromatography and of polishing by immunoaffinity chromatographic steps might alter the repertoire of specificities and influence the AE profiles and efficacy of IgG concentrates.

    Services et protocoles pour l'exécution fiable d'applications distribuées dans les grilles de calcul

    No full text
    A grid gathers a large number of heterogeneous computing resources belonging to various administrative domains. Grids are attractive because they can provide users with the amount of computing resources needed to execute scientific applications. However, executing applications in a grid is challenging because the failure rate is high. To execute applications reliably in the grid, we first propose a rollback recovery service in charge of automatically restarting failed applications. Then we propose a framework to provide grid services with high availability and self-healing. Finally, we propose a scalable rollback-recovery protocol for message passing applications.